74 research outputs found
Neural Machine Translation of Logographic Languages Using Sub-character Level Information
Recent neural machine translation (NMT) systems have been greatly improved by
encoder-decoder models with attention mechanisms and sub-word units. However,
important differences between languages with logographic and alphabetic writing
systems have long been overlooked. This study focuses on these differences and
uses a simple approach to improve the performance of NMT systems by exploiting
decomposed sub-character level information for logographic languages. Our
results indicate that this approach not only improves the translation quality
of NMT systems between Chinese and English, but also further improves NMT
systems between Chinese and Japanese, because it exploits the shared
information carried by similar sub-character units.

Comment: WMT 2018 (regular paper); 9 pages
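As an illustration of the idea described above, the following is a minimal sketch of sub-character decomposition, assuming a hypothetical ideograph-decomposition table (IDS_TABLE); the paper's actual decomposition data and NMT pipeline are not shown.

```python
# Hypothetical mapping from characters to their ideographic components.
IDS_TABLE = {
    "好": ["女", "子"],   # "good" = woman + child
    "明": ["日", "月"],   # "bright" = sun + moon
}

def decompose(sentence: str) -> str:
    """Replace each character with its sub-character components, falling
    back to the character itself when no decomposition is available."""
    units = []
    for ch in sentence:
        units.extend(IDS_TABLE.get(ch, [ch]))
    return " ".join(units)

print(decompose("明好"))  # -> "日 月 女 子"
```

The decomposed text could then be fed to the usual sub-word segmentation and NMT training steps.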
The Rule of Three: Abstractive Text Summarization in Three Bullet Points
Neural network-based approaches have become widespread for abstractive text
summarization. Although previously proposed models for abstractive text
summarization addressed the problem of repeating the same content in the
summary, they did not explicitly consider its information structure. One of the
reasons these previous models failed to account for information structure in
the generated summary is that standard datasets include summaries of variable
lengths, resulting in problems in analyzing information flow, specifically, the
manner in which the first sentence is related to the following sentences.
Therefore, we use a dataset containing summaries with only three bullet points,
and propose a neural network-based abstractive summarization model that
considers the information structures of the generated summaries. Our
experimental results show that the information structure of a summary can be
controlled, thus improving overall summarization performance.

Comment: 9 pages; PACLIC 201
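One conceivable way to expose a summary's information structure to a sequence-to-sequence model is to mark each bullet's position in the target sequence; a minimal sketch follows, where the position tokens (<b1>..<b3>) are assumed for illustration and are not necessarily the paper's scheme.

```python
def format_target(bullets: list[str]) -> str:
    """Concatenate exactly three bullet points, prefixing each with a
    hypothetical position token so the decoder can condition on it."""
    assert len(bullets) == 3, "dataset is restricted to three-bullet summaries"
    return " ".join(f"<b{i + 1}> {b}" for i, b in enumerate(bullets))

print(format_target([
    "Model is proposed.",
    "Structure is controlled.",
    "Performance improves.",
]))
```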
Multi-Head Multi-Layer Attention to Deep Language Representations for Grammatical Error Detection
It is known that a deep neural network model pre-trained with large-scale
data greatly improves the accuracy of various tasks, especially when there are
resource constraints. However, the information needed to solve a given task can
vary, and simply using the output of the final layer is not necessarily
sufficient. Moreover, to our knowledge, exploiting large language
representation models to detect grammatical errors has not yet been studied. In
this work, we investigate the effect of utilizing information not only from the
final layer but also from intermediate layers of a pre-trained language
representation model to detect grammatical errors. We propose a multi-head
multi-layer attention model that determines the appropriate layers in
Bidirectional Encoder Representations from Transformers (BERT). The proposed
method achieved the best scores on three datasets for grammatical error
detection tasks, outperforming the current state-of-the-art method by 6.0
points on FCE, 8.2 points on CoNLL14, and 12.2 points on JFLEG in terms of
F_0.5. We also demonstrate that by using multi-head multi-layer attention, our
model can exploit a broader range of information for each token in a sentence
than a model that uses only the final layer's information.

Comment: 12 pages; CICLing 201
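The layer-combination idea can be sketched as per-token attention over the hidden states of every BERT layer, shown single-head below for brevity (the paper uses multiple heads); the scoring function is an assumption, not the authors' code.

```python
import torch
import torch.nn as nn

class MultiLayerAttention(nn.Module):
    """Minimal sketch: attend over all layers' hidden states per token."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.score = nn.Linear(hidden_size, 1)  # scores each layer's vector

    def forward(self, layer_states: torch.Tensor) -> torch.Tensor:
        # layer_states: (num_layers, batch, seq_len, hidden)
        scores = self.score(layer_states)            # (L, B, T, 1)
        weights = torch.softmax(scores, dim=0)       # attention over layers
        return (weights * layer_states).sum(dim=0)   # (B, T, hidden)

# Usage with dummy states shaped like a 12-layer BERT-base:
states = torch.randn(12, 2, 8, 768)
combined = MultiLayerAttention(768)(states)
print(combined.shape)  # torch.Size([2, 8, 768])
```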
Sparse Named Entity Classification using Factorization Machines
Named entity classification is the task of classifying text-based elements
into various categories, including places, names, dates, times, and monetary
values. A bottleneck in named entity classification, however, is data
sparseness: new named entities continually emerge, making it difficult to
maintain a dictionary for named entity classification. Thus, in this paper,
we address named entity classification using matrix factorization to overcome
feature sparsity.
Experimental results show that our proposed model, with fewer features and a
smaller model size, achieves accuracy competitive with state-of-the-art models.

Comment: 4+1 pages
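For reference, a second-order factorization machine scores a sparse feature vector as y(x) = w0 + Σ_i w_i x_i + Σ_{i<j} ⟨v_i, v_j⟩ x_i x_j; a minimal sketch using the O(kn) identity of Rendle (2010) follows, with placeholder (untrained) parameters.

```python
import numpy as np

def fm_score(x, w0, w, V):
    """Second-order factorization machine score.
    Pairwise term uses 0.5 * sum_f ((sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2)."""
    linear = w0 + x @ w
    xv = x @ V                                    # (k,) per-factor sums
    pairwise = 0.5 * np.sum(xv ** 2 - (x ** 2) @ (V ** 2))
    return linear + pairwise

x = np.array([1.0, 0.0, 1.0])          # sparse feature vector
w0, w = 0.1, np.array([0.2, 0.0, -0.1])
V = np.random.randn(3, 4) * 0.01       # factor matrix, k = 4
print(fm_score(x, w0, w, V))
```

The factor matrix V lets features that rarely co-occur still interact through shared latent dimensions, which is what makes the model compact under sparsity.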
Debiasing Word Embeddings Improves Multimodal Machine Translation
In recent years, pretrained word embeddings have proved useful for multimodal
neural machine translation (NMT) models to address the shortage of available
datasets. However, the integration of pretrained word embeddings has not yet
been explored extensively. Further, pretrained word embeddings in
high-dimensional spaces have been reported to suffer from the hubness problem.
Although some debiasing techniques have been proposed to address this problem
for other natural language processing tasks, they have seldom been studied for
multimodal NMT models. In this study, we examine various kinds of word
embeddings and introduce two debiasing techniques for three multimodal NMT
models and two language pairs -- English-German translation and English-French
translation. With our optimal settings, the overall performance of multimodal
models was improved by up to +1.93 BLEU and +2.02 METEOR for English-German
translation and +1.73 BLEU and +0.95 METEOR for English-French translation.

Comment: 11 pages; MT Summit 2019 (camera ready)
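One standard debiasing method for the hubness problem is All-but-the-Top (Mu and Viswanath, 2018): subtract the mean vector and project out the top principal components. A minimal sketch follows; whether this matches the paper's exact debiasing choices is an assumption.

```python
import numpy as np

def all_but_the_top(E, d=2):
    """Center the embeddings, then remove the top-d principal directions."""
    E = E - E.mean(axis=0)                         # center the space
    U, S, Vt = np.linalg.svd(E, full_matrices=False)
    top = Vt[:d]                                   # top-d principal directions
    return E - (E @ top.T) @ top                   # project them out

E = np.random.randn(1000, 300)                     # e.g., pretrained embeddings
print(all_but_the_top(E).shape)                    # (1000, 300)
```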
Long Short-Term Memory for Japanese Word Segmentation
This study presents a Long Short-Term Memory (LSTM) neural network approach
to Japanese word segmentation (JWS). Previous studies on Chinese word
segmentation (CWS) succeeded in using recurrent neural networks such as LSTMs
and gated recurrent units (GRUs). However, in contrast to Chinese, Japanese
includes several character types, such as hiragana, katakana, and kanji, that
produce orthographic variations and increase the difficulty of word
segmentation. Additionally, it is important for JWS tasks to consider a global
context, and yet traditional JWS approaches rely on local features. In order to
address this problem, this study proposes employing an LSTM-based approach to
JWS. The experimental results indicate that the proposed model achieves
state-of-the-art accuracy on various Japanese corpora.

Comment: 10 pages; PACLIC 201
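Word segmentation as described here is typically cast as character-level sequence labeling; the following is a minimal sketch with a bidirectional LSTM and B/I/E/S-style tags, where the hyperparameters and tag inventory are assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn

class BiLSTMSegmenter(nn.Module):
    """Minimal sketch: predict a boundary tag for every character."""

    def __init__(self, vocab_size, embed_dim=64, hidden=128, num_tags=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, bidirectional=True,
                            batch_first=True)
        self.out = nn.Linear(2 * hidden, num_tags)

    def forward(self, char_ids):
        h, _ = self.lstm(self.embed(char_ids))
        return self.out(h)                         # per-character tag scores

model = BiLSTMSegmenter(vocab_size=5000)
scores = model(torch.randint(0, 5000, (1, 10)))    # one 10-character sentence
print(scores.argmax(-1))                           # predicted tag indices
```

The bidirectional pass is what supplies the global, sentence-wide context that the abstract argues local-feature approaches lack.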
Japanese Sentiment Classification using a Tree-Structured Long Short-Term Memory with Attention
Previous approaches to training syntax-based sentiment classification models
required phrase-level annotated corpora, which are not readily available in
many languages other than English. Thus, we propose a tree-structured Long
Short-Term Memory (LSTM) model with an attention mechanism that attends to
each subtree of the parse tree. Experimental results indicate that our model
achieves state-of-the-art performance on a Japanese sentiment classification
task.

Comment: 10 pages; PACLIC 201
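The subtree attention can be sketched as a weighted sum over subtree vectors produced by a tree-structured LSTM; the tree-LSTM itself is omitted below, and the additive scoring form is an assumption.

```python
import torch
import torch.nn as nn

class SubtreeAttention(nn.Module):
    """Minimal sketch: attend over one vector per parse subtree."""

    def __init__(self, hidden):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(hidden, hidden), nn.Tanh(),
                                   nn.Linear(hidden, 1))

    def forward(self, subtree_states):
        # subtree_states: (num_subtrees, hidden)
        weights = torch.softmax(self.score(subtree_states), dim=0)
        return (weights * subtree_states).sum(dim=0)   # sentence vector

states = torch.randn(7, 128)           # e.g., 7 subtrees of a parse tree
sentence_vec = SubtreeAttention(128)(states)
print(sentence_vec.shape)              # torch.Size([128])
```

Because supervision is needed only at the sentence level, this sidesteps the phrase-level annotation the abstract identifies as the bottleneck.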
Multi-task Learning for Japanese Predicate Argument Structure Analysis
An event-noun is a noun whose argument structure is similar to that of a
predicate. Recent works, including those considered state-of-the-art, ignore
event-nouns or build a single model for solving both Japanese predicate
argument structure analysis (PASA) and event-noun argument structure analysis
(ENASA). However, because there are interactions between predicates and
event-nouns, it is not sufficient to target only predicates. To address this
problem, we present a multi-task learning method for PASA and ENASA. Our
multi-task models improved the performance of both tasks compared to
single-task models by sharing knowledge across tasks. Moreover, in PASA, our
models achieved state-of-the-art results in overall F1 scores on the NAIST
Text Corpus. In addition, this is the first work to employ neural networks in
ENASA.

Comment: 10 pages; NAACL 201
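A common realization of such multi-task learning is hard parameter sharing: one shared encoder with a separate classification head per task. A minimal sketch follows; the sizes and label set are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class SharedArgStructureModel(nn.Module):
    """Minimal sketch: shared encoder, one head each for PASA and ENASA."""

    def __init__(self, vocab=10000, dim=128, num_labels=4):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.encoder = nn.LSTM(dim, dim, bidirectional=True, batch_first=True)
        self.pasa_head = nn.Linear(2 * dim, num_labels)   # predicate args
        self.enasa_head = nn.Linear(2 * dim, num_labels)  # event-noun args

    def forward(self, ids, task):
        h, _ = self.encoder(self.embed(ids))
        head = self.pasa_head if task == "pasa" else self.enasa_head
        return head(h)

model = SharedArgStructureModel()
print(model(torch.randint(0, 10000, (1, 6)), task="enasa").shape)
```

Gradients from both tasks update the shared encoder, which is how knowledge about predicates can transfer to event-nouns and vice versa.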
Word-Alignment-Based Segment-Level Machine Translation Evaluation using Word Embeddings
One of the most important problems in machine translation (MT) evaluation is
evaluating the similarity between a translation hypothesis and the reference
when their surface forms differ, especially at the segment level. We propose
to use word embeddings to perform word alignment for segment-level MT
evaluation. We experimented with three types of alignment methods using word
embeddings and evaluated them on various translation datasets. Experimental
results show that our proposed methods outperform previous
word-embedding-based methods.

Comment: 5 pages
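As a rough illustration of embedding-based alignment for evaluation, one simple strategy greedily aligns each hypothesis word to its most similar reference word by cosine similarity and averages the matched similarities; the paper's three alignment methods may differ from this sketch.

```python
import numpy as np

def alignment_score(hyp_vecs, ref_vecs):
    """Greedy best-match cosine similarity, averaged over hypothesis words."""
    def unit(v):
        return v / np.linalg.norm(v)
    sims = np.array([[unit(h) @ unit(r) for r in ref_vecs] for h in hyp_vecs])
    return sims.max(axis=1).mean()

hyp = [np.random.randn(300) for _ in range(5)]   # hypothesis word embeddings
ref = [np.random.randn(300) for _ in range(6)]   # reference word embeddings
print(alignment_score(hyp, ref))
```

Because the match is made in embedding space, a hypothesis word can be credited for a synonym of the reference word even when the surface forms differ.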
English-Japanese Neural Machine Translation with Encoder-Decoder-Reconstructor
Neural machine translation (NMT) has recently become popular in the field of
machine translation. However, NMT suffers from the problem of repeating or
missing words in the translation. To address this problem, Tu et al. (2017)
proposed an encoder-decoder-reconstructor framework for NMT using
back-translation. In this method, they selected the best forward translation
model in the same manner as Bahdanau et al. (2015), and then trained a
bi-directional translation model by fine-tuning. Their experiments show that
the framework offers a significant improvement in BLEU scores on a
Chinese-English translation task. We confirm that our re-implementation shows
the same tendency and alleviates the problem of repeating and missing words in
the translation on an English-Japanese task as well. In addition, we evaluate
the effectiveness of pre-training by comparing it with a model jointly trained
on forward translation and back-translation.

Comment: 8 pages
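The training objective of such a framework can be summarized as the forward translation loss plus a weighted reconstruction loss; a minimal sketch follows, where forward_nll and reconstruction_nll stand in for real model outputs and are assumptions.

```python
import torch

def total_loss(forward_nll: torch.Tensor,
               reconstruction_nll: torch.Tensor,
               lam: float = 1.0) -> torch.Tensor:
    """L = L_forward + lambda * L_reconstruction (Tu et al., 2017-style).
    forward_nll: NLL of the target given the source.
    reconstruction_nll: NLL of the source rebuilt from decoder states."""
    return forward_nll + lam * reconstruction_nll

print(total_loss(torch.tensor(2.3), torch.tensor(1.7)))
```

Penalizing a poor reconstruction pushes the decoder states to retain all source content, which is what discourages the repeated and missing words described above.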
- …